- These scripts parse the .xml files downloaded from "https://bulkdata.uspto.gov/" (section "Patent Grant Full Text Data (No Images)") and write the extracted fields to .csv files.
- You can convert the resulting .csv files into .dta files in Stata using the "insheet" or "import delimited" command (see "csv2dta_sample.do" for examples).
- These scripts are optimized for the .xml files released in 2016-2017. For other periods, you may have to change some lines after inspecting the .xml files, because the location of the items to be extracted can differ across periods.

1. basic.py
- Extracts "wku", "isd", "apd", and "nam_assg", which are the patent application number, issue date, application date, and name of the first assignee, respectively.
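
As a rough illustration of this kind of extraction (not the actual contents of basic.py), the sketch below splits a bulk file into its individual XML documents and pulls the four fields from each one. It assumes that a bulk file is a concatenation of documents that each start with their own `<?xml` declaration, and that the tag paths match the 2016-2017 files. The tag paths, the function names, and the choice of the `application-reference` document number for "wku" are assumptions made for this example.

```python
import csv
import xml.etree.ElementTree as ET

XML_DECL = "<?xml"
BIB = "us-bibliographic-data-grant"  # assumed tag name in the 2016-2017 files

# Assumed tag paths; adjust after inspecting the .xml files of other periods.
PATHS = {
    "wku": f"{BIB}/application-reference/document-id/doc-number",
    "isd": f"{BIB}/publication-reference/document-id/date",
    "apd": f"{BIB}/application-reference/document-id/date",
    "nam_assg": f"{BIB}/assignees/assignee/addressbook/orgname",
}


def split_documents(text):
    """Yield each XML document contained in a concatenated bulk file."""
    start = text.find(XML_DECL)
    while start != -1:
        nxt = text.find(XML_DECL, start + 1)
        yield text[start:nxt] if nxt != -1 else text[start:]
        start = nxt


def first_text(root, path):
    """Return the text of the first element matching path, or ''."""
    node = root.find(path)
    return node.text.strip() if node is not None and node.text else ""


def extract_basic(doc):
    """Extract the four fields from one XML document (first assignee only)."""
    root = ET.fromstring(doc.encode("utf-8"))
    return {name: first_text(root, path) for name, path in PATHS.items()}


def parse_bulk_file(xml_path, csv_path):
    """Write one csv row per XML document; both paths are hypothetical."""
    with open(xml_path, encoding="utf-8") as src:
        text = src.read()
    with open(csv_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.DictWriter(dst, fieldnames=list(PATHS))
        writer.writeheader()
        for doc in split_documents(text):
            writer.writerow(extract_basic(doc))
```

Missing items (for example, a patent without an assignee) come out as empty strings, so every row keeps the same columns.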

2. assg_location.py
- Extracts "wku", "asscode", "cnt", "sta", and "city", which are the patent application number, the assignee type code given by the USPTO, and the country, state, and city of the assignee, respectively.

3. cite.py
- Extracts "citing" and "cited", which are the application numbers of the citing patent and the cited patent, respectively.
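
A minimal sketch of how such citing/cited pairs could be built from one XML document is given below. It is not the actual cite.py: the tag paths are assumptions about the 2016-2017 files, the function name is hypothetical, and the sketch simply takes whatever document number is stored under each cited patent entry. Non-patent citations are skipped because they have no `patcit` element.

```python
import xml.etree.ElementTree as ET

BIB = "us-bibliographic-data-grant"  # assumed tag name in the 2016-2017 files

# Assumed tag paths; other periods may name these elements differently.
CITING_PATH = f"{BIB}/application-reference/document-id/doc-number"
CITED_PATH = f"{BIB}/us-references-cited/us-citation/patcit/document-id/doc-number"


def extract_citations(doc):
    """Return a list of (citing, cited) pairs for one XML document."""
    root = ET.fromstring(doc.encode("utf-8"))
    citing_node = root.find(CITING_PATH)
    if citing_node is None or not citing_node.text:
        return []
    citing = citing_node.text.strip()
    pairs = []
    for node in root.findall(CITED_PATH):
        if node.text:
            pairs.append((citing, node.text.strip()))
    return pairs
```

Each pair becomes one row of the output, so a patent with many references contributes many rows with the same "citing" value.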

4. invent.py 
- Extracts "wku", "nam_invt", "cnt", "sta", and "city", which are the patent application number and the name, country, state, and city of the inventor, respectively.

5. ipc folder  
- Extracts "wku" and "ipc", which are the patent application number and the full IPC (International Patent Classification) given to that patent.
- There are separate scripts for different periods because the xml files differ slightly across periods.
- 15q1.py and 15.4 exist only to speed up parsing, not because the xml files changed.
- After the xml files are parsed, the results are converted into '.dta' files and merged into a single file.
- Before 2006, only the main classification is extracted.
- After 2006, all IPCs are extracted, but the "ipc.dta" file of the database includes only the first IPC of each patent.
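
The reduction to the first IPC of each patent can be sketched as follows. This is only an illustration, not the procedure actually used to build "ipc.dta": the function name is hypothetical, and the sketch assumes the ("wku", "ipc") rows are still in the order in which the IPCs appear in the xml files.

```python
def first_ipc_per_patent(rows):
    """Keep only the first (wku, ipc) row seen for each wku.

    Assumes rows arrive in the order the IPCs are listed in the xml file.
    """
    first = {}
    for wku, ipc in rows:
        # setdefault stores the IPC only if this wku has not been seen yet
        first.setdefault(wku, ipc)
    return list(first.items())


rows = [
    ("14065789", "A01B 1/00"),
    ("14065789", "B02C 3/00"),
    ("14000000", "G06F 17/30"),
]
print(first_ipc_per_patent(rows))
```

Because Python dictionaries preserve insertion order, the patents come out in the order they were first encountered.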

6. csv2dta_sample.do 
- Example code that converts csv files into dta format and keeps only the data for utility patents.

